

Reviews: Metalearned Neural Memory

Neural Information Processing Systems

UPDATED: I thank the authors for their rebuttal comments. All my concerns have been addressed (modulo seeing the extra results / error bars), so I am raising my score to 8. The idea of parameterising the memory as a neural network, and using ideas from metalearning to quickly train it to produce a specified output for new sequences, is very interesting and novel. The paper is overall well written, and I believe it should be reproducible by those familiar with metalearning approaches. The justification for the model is interesting: essentially, instead of writing values to a fixed-size memory, with reads limited to a convex combination of the written values, a neural network offers potential benefits in compression and generalisation, at constant space. The key issue with this is whether the memory function can be easily modified in one shot so that a new set of keys and values will be 'read' approximately correctly.
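The write-as-optimisation idea the review describes can be sketched with a toy neural memory: a small MLP where a read is a forward pass and a write is a short burst of gradient steps binding a key to a value. This is only a simplified stand-in; the paper metalearns the update rule rather than using plain SGD, and all sizes, init scales, and learning rates below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy neural memory: a one-hidden-layer MLP mapping keys to values.
# Sizes, init scales, and learning rate are arbitrary illustrative choices.
D_KEY, D_HID, D_VAL = 8, 32, 8
W1 = rng.normal(0.0, 0.1, (D_HID, D_KEY))
W2 = rng.normal(0.0, 0.1, (D_VAL, D_HID))

def read(key):
    """A read is just a forward pass through the memory network."""
    return W2 @ np.tanh(W1 @ key)

def write(key, value, lr=0.05, steps=200):
    """A write is a handful of gradient steps that bind key -> value.
    (MNM metalearns this update; plain SGD on squared error stands in here.)"""
    global W1, W2
    for _ in range(steps):
        h = np.tanh(W1 @ key)
        err = W2 @ h - value                 # read error on this key
        dh = (W2.T @ err) * (1.0 - h ** 2)   # backprop through tanh
        W2 -= lr * np.outer(err, h)
        W1 -= lr * np.outer(dh, key)

key = rng.normal(size=D_KEY)
value = rng.normal(size=D_VAL)
write(key, value)
print(np.max(np.abs(read(key) - value)))  # small residual after the write
```

The sketch makes the review's contrast concrete: the stored association lives in the weights rather than in a slot-addressed table, so reads are not restricted to convex combinations of previously written values.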


Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

Kachuee, Mohammad, Lee, Sungjin

arXiv.org Artificial Intelligence

Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results in enabling consistent improvements in conversational AI systems. However, directly targeting such metrics through off-policy bandit learning objectives often increases the risk of abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework that supports fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical for this problem: it adjusts constraint-violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI system on a set of realistic constraint benchmarks. The experimental results demonstrate that the proposed approach achieves the best balance between policy value and constraint satisfaction rate.
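The per-domain penalty adaptation can be illustrated with a deliberately simplified dual-ascent loop: each domain gets a deviation budget, and its penalty weight is nudged up when the measured deviation exceeds the budget and down when there is slack. The domain names, budgets, and the closed-form "deviation shrinks as the penalty grows" model are invented for illustration; the paper instead computes meta-gradients through the bandit learning objective itself.

```python
# Hypothetical per-domain deviation budgets: tighter for business-critical
# shopping, looser for music (all numbers invented for illustration).
budget = {"shopping": 0.02, "weather": 0.05, "music": 0.10}
base = {"shopping": 0.20, "weather": 0.10, "music": 0.15}  # unpenalised deviation
lam = {d: 1.0 for d in budget}  # per-domain penalty weights, adapted below

def deviation(d):
    """Toy model: a larger penalty weight shrinks the policy deviation."""
    return base[d] / (1.0 + lam[d])

# Dual ascent on the penalties: raise lambda where the constraint is
# violated, lower it where there is slack (clipped at zero).
lr = 2.0
for _ in range(3000):
    for d in budget:
        lam[d] = max(0.0, lam[d] + lr * (deviation(d) - budget[d]))

for d in budget:
    print(f"{d:9s} deviation={deviation(d):.3f} budget={budget[d]:.3f}")
```

At convergence each domain's deviation sits at its budget, which is the "balanced constraint satisfaction across domains" behaviour the abstract describes: no single domain's constraint dominates the penalty.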


Four Deep Learning Papers to Read in January 2022

#artificialintelligence

Welcome to the January edition of the 'Machine-Learning-Collage' series, where I provide an overview of the different Deep Learning research streams. So what is an ML collage? Simply put, I draft one-slide visual summaries of one of my favourite recent papers. At the end of the month, all of the resulting visual collages are collected in a summary blog post. Thereby, I hope to give you a visual and intuitive deep dive into some of the coolest trends.